AIDA: Identifying Code Switching in Informal Arabic Text
Abstract
In this paper, we present the latest version of our system for identifying linguistic code switching in Arabic text. The system relies on Language Models and a tool for morphological analysis and disambiguation for Arabic to identify the class of each word in a given sentence. We evaluate the performance of our system on the test datasets of the shared task at the EMNLP workshop on Computational Approaches to Code Switching (Solorio et al., 2014). The system yields an average token-level Fβ=1 score of 93.6%, 77.7% and 80.1%, on the first, second, and surprise-genre test-sets, respectively, and a tweet-level Fβ=1 score of 4.4%, 36% and 27.7%, on the same test-sets.
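As a rough illustration of how a token-level Fβ=1 score of the kind quoted above can be computed, the sketch below derives per-class precision, recall, and F1 from gold and predicted token labels and then takes an unweighted (macro) average over the classes. The label names (`lang1`, `lang2`, `other`), the toy data, and the choice of macro averaging are assumptions made for illustration; the abstract does not state how the reported average is formed, and this is not the shared task's official scorer.

```python
from collections import Counter


def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_class_f1(gold, pred):
    """Return {label: F1} computed over aligned gold/predicted token labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not correct
            fn[g] += 1  # missed an instance of gold label g
    scores = {}
    for label in set(gold) | set(pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        scores[label] = f_beta(precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of the per-class scores (an assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: one token per position, labels are illustrative only.
gold = ["lang1", "lang1", "lang2", "lang2", "other"]
pred = ["lang1", "lang2", "lang2", "lang2", "other"]
scores = per_class_f1(gold, pred)
print(scores, macro_average(scores))
```

A tweet-level score can be obtained with the same functions by passing one label per tweet instead of one label per token.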
Similar resources
Token Level Identification of Linguistic Code Switching
Typically native speakers of Arabic mix dialectal Arabic and Modern Standard Arabic in the same utterance. This phenomenon is known as linguistic code switching (LCS). It is a very challenging task to identify these LCS points in written text where we don’t have an accompanying speech signal. In this paper, we address automatic identification of LCS points in Arabic social media text by identif...
Translation of Power and Solidarity Pronouns in Qur’anic Rhetoric
Translation of the Holy Quran can be difficult for translators in terms of accuracy and translatability. Sometimes translators fail to render the Quranic thoughts because of the lack of language features in target languages. This results in an unfavorable interpretation. One of the challenging aspects of translating Quran is reference switching as rhetorical devices, which are widespread i...
Mixed Language and Code-Switching in the Canadian Hansard
While there has been lots of interest in code-switching in informal text such as tweets and online content, we ask whether code-switching occurs in the proceedings of multilingual institutions. We focus on the Canadian Hansard, and automatically detect mixed language segments based on simple corpus-based rules and an existing word-level language tagger. Manual evaluation shows that the performa...
Addressing Code-Switching in French/Algerian Arabic Speech
This study focuses on code-switching (CS) in French/Algerian Arabic bilingual communities and investigates how speech technologies, such as automatic data partitioning, language identification and automatic speech recognition (ASR) can serve to analyze and classify this type of bilingual speech. A preliminary study carried out using a corpus of Maghrebian broadcast data revealed a relatively hi...
High capacity steganography tool for Arabic text using 'Kashida'
Steganography is the ability to hide secret information in a cover-media such as sound, pictures and text. A new approach is proposed to hide a secret into Arabic text cover media using "Kashida", an Arabic extension character. The proposed approach is an attempt to maximize the use of "Kashida" to hide more information in Arabic text cover-media. To approach this, some algorithms have been des...